gensim word2vec demo

July 24, 2017, by fendouai

import gensim
sentences = [['first', 'sentence'], ['second', 'sentence']]
# train word2vec on the two sentences
model = gensim.models.Word2Vec(sentences, min_count=1)
# gensim >= 4.0 replaced wv.vocab with wv.key_to_index (word -> integer index)
print(model.wv.key_to_index)
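`min_count` controls which words enter the vocabulary. gensim ignores every word whose total frequency is below it, and the default is 5. On this two-sentence toy corpus no word appears more than twice, so the demo has to lower it to 1 or the vocabulary would be empty. The pure-Python sketch below illustrates that filtering step on the same `sentences`. `build_vocab_sketch` is a hypothetical helper, not gensim code, and ordering indices by descending frequency is an assumption that only loosely resembles gensim 4's `wv.key_to_index`.

```python
from collections import Counter


def build_vocab_sketch(sentences, min_count=1):
    """Hypothetical helper (not gensim's implementation).

    Counts every word in the tokenized sentences, drops words seen fewer
    than `min_count` times, and assigns integer indices with the most
    frequent words first (ties keep first-seen order, per Counter.most_common).
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    kept = [word for word, n in counts.most_common() if n >= min_count]
    return {word: index for index, word in enumerate(kept)}


sentences = [['first', 'sentence'], ['second', 'sentence']]
print(build_vocab_sketch(sentences, min_count=1))  # {'sentence': 0, 'first': 1, 'second': 2}
print(build_vocab_sketch(sentences, min_count=5))  # {} (no word reaches the default threshold of 5)
```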

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

print(texts)
# remove words that appear only once
from collections import defaultdict
frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1

texts = [[token for token in text if frequency[token] > 1] for text in texts]

from pprint import pprint  # pretty-printer
pprint(texts)  # inspect the corpus after stop-word and rare-word filtering
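# build a model on the filtered corpus, making the two steps (vocabulary, training) explicit
new_model = gensim.models.Word2Vec(min_count=1)  # an empty model, no training yet
new_model.build_vocab(texts)                      # first pass: collect the vocabulary
# gensim >= 4.0: the epoch count is new_model.epochs (formerly new_model.iter)
new_model.train(texts, total_examples=new_model.corpus_count, epochs=new_model.epochs)
# word vectors live on new_model.wv; indexing the model itself no longer works in gensim >= 4.0
print(new_model.wv["computer"])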
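The vector printed for "computer" is a dense vector (100 dimensions unless `vector_size` is changed). Word vectors like this are usually compared with cosine similarity, which is what gensim's `model.wv.similarity(word1, word2)` reports. The standalone sketch below computes it in plain Python on small toy lists so the arithmetic is easy to follow. `cosine_similarity` is a hypothetical helper; with a real model you would pass it two `wv[...]` vectors, or simply call `wv.similarity`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # close to -1 (opposite direction)
```

Because the result depends only on direction, not length, two words whose vectors point the same way score near 1 even if one vector has a much larger norm.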
